CMSC 408 Deliverable 8 Database Design

Author

Ashwath Balaji and Carissa Trieu

Published

December 10, 2024

Financial Data Manager Project

Project Overview and Key Deliverables

The Financial Data Manager project is a project created by two Computer Science majors also pursuing minors in General Business. The purpose of this project is to aid investors in making the best investment choices in the stock market. Our database provides the necessary financial data that an investor can find highly useful to evaluate a company’s financial health and wealth, and we believe that a database will not only reduce time, but also get future investors a chance to invest utilizing a wise tool to aid them :).

URL’s

Problem Description

  • Problem domain: Creating a database with this data would address a problem faced by investors and financial analysts. Companies can have different means of reporting which makes it inefficient and time-consuming to look at. Specifically, finding and retrieving the exact information is a waste of time for investors who want to get their data as soon as possible to make investment decisions, especially with the fast-paced and changing nature of the markets. For example, I had to look for information on multiple website wasting around 3-4 hours of time just searching for the most accurate information in order to make stock pitches for the Student Managed Investment Portfolio (SMIP) which is an investment club at VCU.

  • Need: A database is needed to store and organize financial information and is useful for quicker decision-making processes and better management of all the existing data.

  • Context, scope, and perspective: The database is useful for investors and for those who want to get experience reading financial statements to make investment decisions. It takes a long time to look for accurate information about a company and we thought it would be better to have all that important information in one place for users to access that data. For example, if I wanted to decide whether or not to invest in Nvidia, I would first take a look at financial statements to see the financial health of the company to see how they’re doing and then decide whether or not the stock would be a buy, sell, or hold.

  • User roles and use cases: The consumer target is primarily investors in the stock market whether it is a millionaire or a novice who is just getting started. These are the people who will mostly be using our database to fetch financial data for a company they wish to know more about. For example, if an investor wants to get the earningsfor a company like Apple, they can use the database to search the company and find the earnings at a specific point of time. If an investor decides whether or not buying Apple stock would be a good decision, they can use earnings as one of the indicators to make that decision utilizing the database with the data we provide. The database will be use an API that will fetch data from a trusted source to store it in the database.

  • Security and Privacy: The only security concern that we could think of is whether or not the information from the source we get it from is accurate. Another issue could be that in case there is a data breach and all information is lost, then it will be harder for our customers to gain access to essential information.

Database Design

  • Company: represents a business organization and has a unique ID.
  • Stock: represents stock information and has a unique stock ticker.
  • Income_Statement: represents financial performance and has a unique statement ID.
  • Balance_Sheet: represents a company’s financial position and has a unique statement ID.
  • Cash_Flow: represents cash inflows and outflows and has a unique statement ID.
  • Earnings: represents financial earnings and has a unique earnings ID.

Entity-relationship diagrams: (Note: We made the attributes shorter to make the diagram more readable.)

ER Company Company publishes publishes Company--publishes 1 issues issues Company--issues 1 records records Company--records 1 flows flows Company--flows 1 reports reports Company--reports 1 Income_Stmt Income_Stmt Stock Stock Balance_Sheet Balance_Sheet Cash_Flow Cash_Flow Earnings Earnings name name name--Company sector sector sector--Company industry industry industry--Company ticker ticker ticker--Company ticker--Income_Stmt ticker--Stock ticker--Balance_Sheet ticker--Cash_Flow ticker--Earnings IPO_date IPO_date IPO_date--Stock price price price--Stock stmt_id1 stmt_id1 stmt_id1--Income_Stmt stmt_id2 stmt_id2 stmt_id2--Balance_Sheet stmt_id3 stmt_id3 stmt_id3--Cash_Flow fiscal_end fiscal_end fiscal_end--Income_Stmt fiscal_end--Balance_Sheet fiscal_end--Cash_Flow rev rev rev--Income_Stmt cost_rev cost_rev cost_rev--Income_Stmt gross gross gross--Income_Stmt op_inc op_inc op_inc--Income_Stmt net_inc net_inc net_inc--Income_Stmt inc_before_tax inc_before_tax inc_before_tax--Income_Stmt ebit ebit ebit--Income_Stmt ebitda ebitda ebitda--Income_Stmt assets assets assets--Balance_Sheet curr_assets curr_assets curr_assets--Balance_Sheet cash_eq cash_eq cash_eq--Balance_Sheet acc_rec acc_rec acc_rec--Balance_Sheet debt debt debt--Balance_Sheet lt_debt lt_debt lt_debt--Balance_Sheet tang_assets tang_assets tang_assets--Balance_Sheet wc wc wc--Balance_Sheet invest_cap invest_cap invest_cap--Balance_Sheet tang_bv tang_bv tang_bv--Balance_Sheet cap cap cap--Balance_Sheet shares shares shares--Balance_Sheet equity equity equity--Balance_Sheet retained retained retained--Balance_Sheet com_eq com_eq com_eq--Balance_Sheet op_cf op_cf op_cf--Cash_Flow cap_exp cap_exp cap_exp--Cash_Flow fcf fcf fcf--Cash_Flow div_paid div_paid div_paid--Cash_Flow earnings_id earnings_id earnings_id--Earnings fiscal_end_date fiscal_end_date fiscal_end_date--Earnings eps eps eps--Earnings publishes--Income_Stmt N issues--Stock N records--Balance_Sheet N flows--Cash_Flow N reports--Earnings N
Figure 1: Financial Data Manager in Chen Notation

erDiagram
    COMPANY {
        string ticker
        string name
        string sector
        string industry
    }

    INCOME_STATEMENT {
        int statement_id1
        string ticker
        date fiscal_year_end
        float total_revenue
        float cost_of_revenue
        float gross_profit
        float operating_income
        float Net_income_common_stockholders
        float income_before_tax
        float ebit
        float ebitda
    }

    STOCK {
        string ticker
        int company_id
        float price
        date IPO_date
    }

    BALANCE_SHEET {
        string ticker
        int statement_id2
        date fiscal_year_end
        float total_assets
        float current_assets
        float cash
        float cash_and_cash_equivalents
        float accounts_receivables 
        float total_debt
        float net_tangible_assets
        float long_term_debt
        float working_capital
        float invested_capital
        float tangible_book_value
        float total_capitalization
        int shares_issued
        float stockholders_equity
        float retained_earnings
        float common_stock_equity
    }

    CASH_FLOW {
        int statement_id3
        string ticker
        date fiscal_year_end
        float operating_cashflow
        float capital_expenditures
        float free_cash_flow
        float cash_dividends_paid
    }

    EARNINGS {
        int earnings_id
        string ticker
        int earnings_id
        int company_id
        date fiscal_date_end
        float reported_eps
    }

    COMPANY ||--o{ INCOME_STATEMENT : publishes
    COMPANY ||--o{ STOCK : issues
    COMPANY ||--o{ BALANCE_SHEET : records
    COMPANY ||--o{ CASH_FLOW : flows
    COMPANY ||--o{ EARNINGS : reports

Relational Schemas

Note that PK is (Primary Key) and FK is (Foreign Key)

  1. Company(ticker (PK, string), name (string), sector (string), industry (string))
    • This relation simply associates a company with the name, industry, and ticker through the company ID.
  2. Stock(companyID (int), ticker (FK, string), price (float), IPO_date (date))
    • This relation associates a stock with the price, IPO_date, companyID through the stockID.
  3. Income_Statement(statement_id (PK, int), fiscal_year_end (date), total_revenue (float), cost_of_revenue (float), gross_profit (float), operating_income (float), Net_income_common_stockholders (float), pretax_income (float), ebit (float), ebitda (float), ticker (FK, string))
    • This relation stores details of a income statement for a company.
  4. Balance_Sheet(statement_id (PK, int), ticker (FK, string), fiscal_year_end (date), total_assets (float), current_assets (float), cash_and_cash_equivalents (float), accounts_receivable (float), total_debt (float), long_term_debt (float), net_tangible_assets (float), working_capital (float), invested_capital (float), tangible_book_value (float), total_capitalization (float), shares_issued (int), stockholders_equity (float), retained_earnings (float), common_stock_equity (float))
    • This relation stores details of a balance sheet for a company.
  5. Cash_Flow(statement_id (PK, int), ticker (FK, string), fiscal_year_end (date), operating_cashflow (float), capital_expenditures (float), free_cash_flow (float), cash_dividends (float))
    • This relation stores the cash flow data for a company.
  6. Earnings(earnings_id (PK, int), ticker (FK, string), fiscal_date_end (date), reported_eps (float))
    • This relation stores earnings data for a company like reported earnings per share (EPS).

Functional Dependencies and Normalization

Functional Dependencies

  • Company: PK - ticker, FDs ticker → name, sector, industry
  • Income_Statement: PK - statementID, FDs statementID → fiscal_year_end, total_revenue, cost_of_revenue, gross_profit, operating_income, Net_income_common_stockholders, income_before_tax, ebit, ebitda, ticker
  • Stock: PK - company_id, FDs - company_id → eod_price, IPO_date, ticker
  • Balance_Sheet: PK - statement_id, FDs statement_id → ticker, fiscal_year_end, total_assets, current_assets, cash_and_cash_equivalents, accounts_receivable, total_debt, long_term_debt, net_tangible_assets, working_capital, invested_capital, tangible_book_value, total_capitalization, shares_issued, stockholders_equity, retained_earnings, common_stock_equity
  • Cash_Flow: PK - statement_id, FDs statement_id → ticker, fiscal_year_end, operating_cashflow, capital_expenditures, free_cash_flow, cash_dividends
  • Earnings: PK - earnings_id, FDs earnings_id → ticker, fiscal_date_end, reported_eps

Normalization

  • Company: This relation is already in BCNF. There is one candidate key.
  • Income_Statement: This relation is already in BCNF. There is one candidate key.
  • Stock: This relation is already in BCNF. There is one candidate key.
  • Balance_Sheet: This relation is already in BCNF. There is one candidate key.
  • Cash_Flow: This relation is already in BCNF. There is one candidate key.
  • Earnings: This relation is already in BCNF. There is one candidate key.

Specific Queries

  1. List all companies and their stock tickers.

Relational Algebra:
\pi_{\text{Name, Ticker}}(\text{Company})


  1. Find all companies from the Technology sector.

Relational Algebra:
\sigma_{\text{Sector='Technology'}}(\text{Company})


  1. Get the stock ticker and price for Apple.

Relational Algebra:
\sigma_{\text{Stock\_ticker='AAPL'}}(\text{Stock})


  1. Find the companies with IPO dates after 2010

Relational Algebra:
\pi_{\text{Name}}(\sigma_{\text{IPO\_date} > '2010-01-01'}(\text{Company}))


  1. Retrieve the total revenue and net income for Apple

Relational Algebra:
\pi_{\text{total\_revenue}, \text{net\_income}}(\sigma_{\text{Stock\_ticker} = 'AAPL'}(\text{Income\_Statement}))


  1. Get the total debt for all companies in the Consumer Cyclical sector with fiscal year end

Relational Algebra:
\pi_{\text{Name}, \text{total\_debt}}(\sigma_{\text{sector} = 'Consumer Cyclical'}(\text{Balance\_Sheet}))


  1. List all companies with a gross profit greater than 50 million

Relational Algebra:
\pi_{\text{Name}}(\sigma_{\text{gross\_profit} > 50000000}(\text{Income\_Statement}))


  1. Find the companies with total assets greater than 1 billion

Relational Algebra:
\pi_{\text{Name}}(\sigma_{\text{total\_assets} > 1000000000}(\text{Balance\_Sheet}))


  1. Get the fiscal year-end date for Microsoft from 2021 to 2024

Relational Algebra:
\pi_{\text{fiscal\_year\_end}}(\sigma_{\text{Name} = 'Microsoft'}(\text{Stock} \bowtie \text{Company}))


  1. Find all companies that have a working capital greater than 50 million

Relational Algebra:
\pi_{\text{Name}}(\sigma_{\text{working\_capital} > 50000000}(\text{Balance\_Sheet}))


  1. Get the names of companies in the ‘Software Infrastructure’ industry

Relational Algebra:
\pi_{\text{Name}}(\sigma_{\text{industry} = 'Software Infrastructure'}(\text{Company}))


  1. Retrieve the total revenue and cost of revenue for Tesla

Relational Algebra:
\pi_{\text{total\_revenue}, \text{cost\_of\_revenue}}(\sigma_{\text{Name} = 'Tesla'}(\text{Income\_Statement}))


  1. Find companies that have both net income for common stockholders greater than 10 million.

Relational Algebra:
\pi_{\text{Name}}(\sigma_{\text{net\_income} > 10000000}(\text{Income\_Statement}))


  1. Retrieve the operating cashflow for Nvidia.

Relational Algebra:
\pi_{\text{Operating\_cashflow}}(\sigma_{\text{Stock\_ticker} = \text{'NVDA'}}(\text{Cashflow}))


  1. Retreive the earnings for all tickers in the fiscal date end of 2022-12-31

Relational Algebra:
\pi_{\text{Ticker}, \text{Earnings}}(\sigma_{\text{Fiscal\_year\_end} = \text{'2022-12-31'}}(\text{Income\_Statement} \bowtie \text{Ticker}))


  1. Get the capital expenditures for companies which are greater than -2000000

Relational Algebra:
\pi_{\text{Name}, \text{Capital\_expenditures}}(\sigma_{\text{Capital\_expenditures} > -2000000}(\text{Cashflow}))


  1. Retrieve the stock ticker with fiscal year and cash dividends paid which are not null

Relational Algebra:
\pi_{\text{Stock\_ticker}, \text{Fiscal\_year\_end}, \text{Cash\_dividends\_paid}}(\sigma_{\text{Cash\_dividends\_paid} \neq \text{NULL}}(\text{Cash\_Flow}))


  1. Retrieve the stock ticker and free cash flow where free cash flow is greater than 50,000,000

Relational Algebra:
\pi_{\text{Stock\_ticker}, \text{Free\_cash\_flow}}(\sigma_{\text{Free\_cash\_flow} > 50000000}(\text{Cash\_Flow}))


  1. Retrieve the fiscal year and operating cash flow for records where operating cash flow is less than 10,000,000.

Relational Algebra:
\pi_{\text{Fiscal\_year\_end}, \text{Operating\_cashflow}}(\sigma_{\text{Operating\_cashflow} < 10000000}(\text{Cash\_Flow}))


  1. Retrieve the stock ticker, fiscal year, and capital expenditures for records where capital expenditures are negative.

Relational Algebra:
\pi_{\text{Stock\_ticker}, \text{Fiscal\_year\_end}, \text{Capital\_expenditures}}(\sigma_{\text{Capital\_expenditures} < 0}(\text{Cash\_Flow}))

Sample Data

  1. Company
name sector industry ticker (PK)
Apple Inc. Technology Electronics AAPL
Microsoft Technology Software MSFT
Tesla Automotive Internet TSLA
JPMorgan Financial Bank JPM
Pfizer Healthcare Healthcare PFE
  1. Stock
company_id stock_ticker price IPO_date ticker (FK)
1 1001 180.50 2023-01-01 AAPL
2 1002 320.12 2023-02-01 MSFT
3 1003 250.75 2023-03-01 TSLA
4 1004 140.88 2023-04-01 JPM
5 1005 39.25 2023-05-01 PFE
  1. Income_Statement
statement_id fiscal_year_end total_revenue cost_of_revenue gross_profit operating_income Net_income_cs income_before_tax ebit ebitda ticker (FK)
2001 2023-12-31 400000000000 200000000000 200000000000 120000000000 95000000000 110000000000 120000000 150000000 AAPL
2002 2023-12-31 250000000000 80000000000 170000000000 90000000000 75000000000 85000000000 90000000 110000000 MSFT
2003 2023-12-31 100000000000 70000000000 30000000000 15000000000 12000000000 14000000000 15000000 20000000 TSLA
2004 2023-12-31 50000000000 30000000000 20000000000 10000000000 8000000000 9000000000 10000000 13000000 JPM
2005 2023-12-31 25000000000 15000000000 10000000000 6000000000 5000000000 5500000000 6000000 8000000 PFE
  1. Balance_Sheet
statement_id fiscal_year_end total_assets current_assets cash cash_equivalents total_debt shares_issued current_debt long_term_debt retained_earnings common_stock_equity ticker (FK)
3001 2023-12-31 500000000000 200000000000 300000000000 50000000000 200000000 15000000000 20000000000 50000000000 250000000000 500000000 AAPL
3002 2023-12-31 400000000000 150000000000 250000000000 40000000000 150000000 14000000000 15000000000 40000000000 200000000000 400000000 MSFT
3003 2023-12-31 300000000000 120000000000 180000000000 30000000000 100000000 13000000000 10000000000 30000000000 150000000000 300000000 TSLA
3004 2023-12-31 200000000000 100000000000 100000000000 20000000000 80000000 12000000000 5000000000 20000000000 75000000000 200000000 JPM
3005 2023-12-31 150000000000 50000000000 100000000000 10000000000 50000000 11000000000 3000000000 10000000000 50000000000 100000000 PFE
  1. Cash_Flow
statement_id fiscal_year_end operating_cashflow capital_expenditures free_cash_flow cash_dividends ticker (FK)
4001 2023-12-31 120000000000 50000000000 30000000000 20000000000 AAPL
4002 2023-12-31 100000000000 40000000000 25000000000 15000000000 MSFT
4003 2023-12-31 80000000000 30000000000 20000000000 10000000000 TSLA
4004 2023-12-31 60000000000 20000000000 15000000000 8000000000 JPM
4005 2023-12-31 40000000000 10000000000 10000000000 5000000000 PFE
  1. Earnings
earnings_id fiscal_date_end reported_eps ticker (FK)
5001 2023-12-31 6.20 AAPL
5002 2023-12-31 5.50 MSFT
5003 2023-12-31 4.80 TSLA
5004 2023-12-31 3.10 JPM
5005 2023-12-31 2.50 PFE

Project Management

Draft Project Schedule

gantt
    title Project Timeline
    dateFormat  YYYY-MM-DD
    axisFormat  %b %d
    todayMarker stroke-width:2px,stroke:red,opacity:0.5

    section Planning
    Deliverable 8   :done, des1, 2024-09-15, 2024-10-13
    Website Creation :active, des2, 2024-10-13, 2024-10-20
    SQL Database Implementation :active, des2, 2024-10-20, 2024-11-04
    Query Writing :active, des2, 2024-11-04, 2024-12-02
    Testing and Refinement :active, des2, 2024-12-02, 2024-12-09

    section Milestones
    Deliverable 8             :milestone, m1, 2024-10-13, 1d
    Deliverable 9             :milestone, m2, 2024-10-20, 1d
    Deliverable 12             :milestone, m3, 2024-12-09, 1d